Deep Learning for Biomedical Information Retrieval: Learning Textual Relevance from Click Logs
Abstract
We describe a deep learning approach to modeling the relevance of a document’s text to a query, applied to biomedical literature. Instead of mapping each document and query to a common semantic space, we compute a variable-length difference vector between the query and the document, which is then passed through a deep convolution stage followed by a deep regression network to produce the estimated probability of the document’s relevance to the query. Despite the small amount of training data, this approach produces a more robust predictor than computing similarities between semantic vector representations of the query and document, and it also yields significant improvements over traditional IR text factors. In the future, we plan to explore its application in improving PubMed search.
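The pipeline outlined in the abstract (variable-length difference representation, convolution, then a regression output) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the difference is taken between each document word vector and its closest query word vector, a single convolution layer with global max pooling stands in for the deep convolution stage, a single logistic unit stands in for the deep regression network, and all weights are random rather than trained. The function names are hypothetical.

```python
import math
import random


def difference_matrix(query_vecs, doc_vecs):
    """One row per document word: its vector minus the closest query word vector.

    Assumption for illustration only; the row count varies with document length.
    """
    rows = []
    for d in doc_vecs:
        closest = min(
            query_vecs,
            key=lambda q: sum((a - b) ** 2 for a, b in zip(d, q)),
        )
        rows.append([a - b for a, b in zip(d, closest)])
    return rows


def conv1d_relu(rows, filters, bias):
    """Valid 1-D convolution over word positions, followed by ReLU.

    Each filter is a list of `width` rows of `dim` weights.
    """
    width, dim = len(filters[0]), len(filters[0][0])
    if len(rows) < width:  # zero-pad documents shorter than the window
        rows = rows + [[0.0] * dim for _ in range(width - len(rows))]
    out = []
    for start in range(len(rows) - width + 1):
        window = rows[start:start + width]
        feats = []
        for f, b in zip(filters, bias):
            s = b + sum(
                w * x
                for f_row, x_row in zip(f, window)
                for w, x in zip(f_row, x_row)
            )
            feats.append(max(0.0, s))
        out.append(feats)
    return out


def global_max_pool(feature_rows):
    """Collapse a variable number of positions into one value per filter."""
    return [max(col) for col in zip(*feature_rows)]


def relevance_probability(query_vecs, doc_vecs, filters, bias, weights, w0):
    """Logistic output over pooled features (stand-in for the regression network)."""
    diffs = difference_matrix(query_vecs, doc_vecs)
    pooled = global_max_pool(conv1d_relu(diffs, filters, bias))
    z = w0 + sum(w * p for w, p in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


# Toy setup: random (untrained) parameters and random "word vectors".
random.seed(0)
dim, width, n_filters = 4, 2, 3
filters = [[rand_vec(dim) for _ in range(width)] for _ in range(n_filters)]
bias = [0.0] * n_filters
weights = rand_vec(n_filters)

query = [rand_vec(dim) for _ in range(2)]
short_doc = [rand_vec(dim) for _ in range(5)]
long_doc = [rand_vec(dim) for _ in range(12)]

p_short = relevance_probability(query, short_doc, filters, bias, weights, 0.0)
p_long = relevance_probability(query, long_doc, filters, bias, weights, 0.0)
print(p_short, p_long)
```

The pooling step is what lets documents of different lengths feed the same fixed-size output layer. In a real system the filters and output weights would be learned from the click-log labels rather than drawn at random.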
Similar resources
A Fast Deep Learning Model for Textual Relevance in Biomedical Information Retrieval
Publications in the life sciences are characterized by a large technical vocabulary, with many lexical and semantic variations for expressing the same concept. Towards addressing the problem of relevance in biomedical literature search, we introduce a deep learning model for the relevance of a document’s text to a keyword style query. Limited by a relatively small amount of training data, the m...
Word2VisualVec: Cross-Media Retrieval by Visual Feature Prediction
This paper attacks the challenging problem of cross-media retrieval. That is, given an image find the text best describing its content, or the other way around. Different from existing works, which either rely on a joint space, or a text space, we propose to perform cross-media retrieval in a visual space only. We contribute Word2VisualVec, a deep neural network architecture that learns to pred...
Learning SVM Ranking Function from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain
Information overload is a well-known problem facing biomedical professionals. MEDLINE, the biomedical bibliographic database, adds hundreds of articles daily to the millions already in its collection. This overload is exacerbated by the lack of relevance-based ranking for search results, as well as disparate levels of search skill and domain experience of professionals using systems designed to...
Priors in Web Search
Web search combines information obtained at query time with prior knowledge to form a posterior. This paper focuses on the prior, which we believe is interesting, given the poverty of the query stimulus (many of the web queries are no more than a word or two). We propose a learning framework based on the Noisy Channel Model for combining prior evidence from multiple sources including both the a...
Concept drift detection in business process logs using deep learning
Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other hand. Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, as most business processes change over time (e.g., the effects of new legislation, seasonal effects, etc.), traditional process mining techniques ...